Optimal replacement policies for non-uniform cache objects with optional eviction

Omri  Bahat,  Armand  M.  Makowski 


Abstract — Replacement policies for general caching applications and Web caching in particular have been extensively addressed in the literature. Many policies that focus on document costs, size, probability of references and temporal locality of requested documents have been proposed. In many cases these policies are ad-hoc attempts to take advantage of the statistical information contained in the stream of requests, and to address the factors above. However, since the introduction of optimal replacement policies for conventional caching, the problem of finding optimal replacement policies under the factors indicated has not been studied in any systematic manner. In this paper, we take a step in that direction: We first show, still under the Independent Reference Model, that a simple Markov stationary replacement policy, called the policy $C_0$, minimizes the long-run average metric induced by non-uniform document costs when document eviction is optional. We then use these results to propose a framework to operate caching systems with multiple performance metrics. We do so by solving a constrained caching problem with a single constraint. The resulting constrained optimal replacement policy is obtained by simple randomization between two Markov stationary optimal replacement policies $C_0$ but induced by different costs.

Index Terms — Web caching, Optimal replacement policies, Non-uniform cost, Independent Reference Model, Caching under a constraint, Markov decision processes.

I.  Introduction 

Web caching aims to reduce network traffic, server load and user-perceived retrieval latency by replicating "popular" content on proxy caches that are strategically placed within the network. Key to the effectiveness of such proxy caches is the implementation of document replacement algorithms that can yield high hit ratios. A large number of techniques for file caching and virtual memory replacement have been developed [1] [4], but unfortunately they do not necessarily transfer to Web caching, as explained below. Despite the ever decreasing prices of storage devices, the optimization or fine tuning of cache replacement policies is not a moot point, for even slight improvements in cache performance can have an appreciable effect on network traffic, especially when such gains are compounded through a hierarchy of caches.

[...] expressed in this material are those of the authors and do not necessarily reflect the views of the Space and Naval Warfare Systems Center - San Diego.

In the context of conventional caching the underlying working assumption is the so-called Independent Reference Model, whereby document requests are assumed to form an i.i.d. sequence. It has been known for some time [1] [4] that the miss rate (equivalently, the hit rate) is minimized (equivalently, maximized) by the so-called policy $A_0$, according to which a document is evicted from the cache if it has the smallest probability of occurrence (equivalently, is the least popular) among the documents in the cache. More precisely, let doc(1), ..., doc(N) denote the set of documents to be requested and let $p(j)$ denote the probability of reference of doc(j) ($j = 1, \ldots, N$). When the cache is full and the requested document is not in the cache, $A_0$ prescribes

$$\text{Evict doc}(i) \quad \text{if} \quad i = \arg\min \left( p(j) : \text{doc}(j) \text{ in cache} \right). \tag{1}$$

In practice, the popularity vector $p = (p(1), \ldots, p(N))$ is not available and thus needs to be estimated on-line as requests are coming in. This naturally gives rise to the LFU (Least Frequently Used) policy, which mimics $A_0$ through the Certainty Equivalence Principle: When the cache is full and the $K$-th requested document is not in the cache, LFU prescribes

$$\text{Evict doc}(i) \quad \text{if} \quad i = \arg\min \left( \hat{p}_K(j) : \text{doc}(j) \text{ in cache} \right) \tag{2}$$

where $\hat{p}_K(j)$ is the frequency estimate of $p(j)$ based on the trace measurements up to the $K$-th request. The focus on miss and hit rates as performance criteria is reflective of the fact that historically, pages in memory systems were of equal size, and transfer times of pages from the primary storage to the cache were nearly constant over time and independent of the document transferred.
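The eviction rules (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `a0_victim` and `LFUCache` are hypothetical, documents are identified by integers, ties are broken by document index, and raw request counts stand in for the frequency estimate $\hat{p}_K(j)$ (dividing every count by $K$ does not change the arg min).

```python
from collections import Counter


def a0_victim(cache, p):
    """Rule (1): among cached documents, pick the one with the smallest p(j).

    Ties are broken by document index (an assumption made for this sketch).
    """
    return min(cache, key=lambda j: (p[j], j))


class LFUCache:
    """Rule (2): LFU with mandatory replacement on a miss (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = set()
        self.counts = Counter()  # counts[j] / K plays the role of the estimate of p(j)

    def request(self, doc):
        """Serve one request; return True on a hit and False on a miss."""
        self.counts[doc] += 1
        if doc in self.cache:
            return True
        if len(self.cache) >= self.capacity:
            # Evict the least frequently requested cached document.
            victim = min(self.cache, key=lambda j: (self.counts[j], j))
            self.cache.remove(victim)
        self.cache.add(doc)  # the requested document is always placed in cache
        return False
```

Note that in this sketch the requested document is always inserted after a miss, which is the mandatory-replacement convention of conventional caching.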

Interestingly enough, even in this restricted context, the popularity information as derived from the relative access frequencies of objects requested through the cache is seldom maintained and is rarely used directly in the design of cache replacement policies. This is so because of the difficulty of capturing this information in an on-line fashion, in contrast with other attributes of the request stream, said attributes being thought indicative of the future popularity of the object. Typical examples include temporal locality via the recency of access and object size, which lead very naturally to the Least-Recently-Used (LRU) and Largest-File-First (LFF) replacement policies, respectively.

At this point it is worth stressing the three primary differences between Web caching and conventional caching:

1) Web objects or documents are of variable size whereas conventional caching handles fixed-size documents or pages. Neither the policy $A_0$ nor the LRU policy (nor many other policies proposed in the literature on conventional caching) accounts for the variable size of documents;

2) The miss penalty or retrieval cost of missed documents from the server to the proxy can vary significantly over time and from one document to another. In fact, the cost value may not be known in advance and must sometimes be estimated on-line before a decision is taken. For instance, the download time of a Web page depends on the size of the document to be retrieved, on the available bandwidth from the server to the cache, and on the route used. These factors may vary over time due to changing network conditions (e.g., link failure or network overload);

3) Access streams seen by the proxy cache are the union of Web access streams from tens to thousands of users, instead of coming from a few programmed sources as is the case in virtual memory paging, so the Independent Reference Model is not likely to provide a good fit to Web traces. In fact, Web traffic patterns were found to exhibit temporal locality (i.e., temporal correlations) in that recently accessed objects are more likely to be accessed in the near future. To complicate matters, the popularity of Web objects was found to be highly variable (i.e., bursty) over short time scales but much smoother over long time scales.

These differences, namely variable size, variable cost and the more complex statistics of request patterns, preclude an easy transfer of caching techniques developed earlier for computer system memory. Yet, a large number of studies have focused on the design of efficient replacement policies; see [6] [7] [8] [9] and references therein for a sample of the literature. Proposed policies typically exploit either access recency (e.g., the LRU policy) or access frequency (e.g., the LFU policy) or a combination thereof (e.g., the hybrid LRFU policy). The numerous policies which have been proposed are often ad-hoc attempts to take advantage of the statistical information contained in the stream of requests, and to address the factors above. Their performance is typically evaluated via trace-driven simulations, and compared to that of other well-established policies.

As should be clear from the discussion above, the classical set-up used in [1] and [4] is too restrictive to capture the salient features present in Web caching. Indeed, the Independent Reference Model fails to capture both popularity (i.e., long-term frequencies of requested documents) and temporal locality (i.e., correlations among document requests). It also does not account for documents with variable sizes. Moreover, this literature implicitly assumes that document replacement is mandatory upon a cache miss, i.e., a requested document not found in cache must be put in the cache. While this requirement is understandable when managing computer memory, it is not as crucial when considering web caches,¹ especially if this approach results in simple document replacement policies with good performance.

With these difficulties in mind it seems natural to seek to extend these provably optimal caching policies in several directions: (i) The documents have non-uniform costs (as we assimilate cost to size and variable retrieval latency), (ii) there exist correlations in the request streams, and (iii) document placement and replacement are optional upon a cache miss.

¹ In web caching, timescales are slower than in conventional caching due to variable network latencies.

In this paper, we take an initial step in the directions (i) and (iii): While still retaining the Independent Reference Model, we consider the problem of finding an optimal replacement policy with non-uniform retrieval cost $c(j)$ ($j = 1, \ldots, N$) under the option that a requested document not in cache is not necessarily put in cache after being retrieved from the server. Interestingly enough, this simple change in operational constraints allows us to determine completely the structure of the optimal replacement policy for the minimum average cost criterion (over both finite and infinite horizons). Making use of standard ideas from the theory of Markov Decision Processes (MDPs), we show [Theorem 1] that the optimal policy is the (pure) Markov stationary policy $C_0$ that prescribes

$$\text{Evict doc}(i) \quad \text{if} \quad i = \arg\min \left( p(j) c(j) : \text{doc}(j) \text{ in cache or request} \right). \tag{3}$$

The simplicity of this optimal replacement policy should be contrasted with the state of affairs in the traditional formulation when replacement is mandatory. Indeed, in the latter case, except for the optimality of $A_0$ for uniform cache objects, there are no known results concerning the structure of the optimal policy for an arbitrary (thus non-uniform) cost structure (to the best of the authors' knowledge). It is tempting, yet erroneous, to conclude that the simple stationary Markov replacement policy that prescribes

$$\text{Evict doc}(i) \quad \text{if} \quad i = \arg\min \left( p(j) c(j) : \text{doc}(j) \text{ in cache} \right) \tag{4}$$

is optimal; this policy is "myopically" optimal but usually not optimal, as simple examples show. Curiously, the policy (4) is reminiscent of, and similar to, the policy $C_0$ as given in (3).
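The only difference between rules (3) and (4) is the candidate set over which the product $p(j)c(j)$ is minimized. The following Python sketch makes this explicit; the function names `c0_action` and `mandatory_action` are hypothetical, documents are integers, and ties are broken by document index, all of which are assumptions of this illustration rather than part of the paper.

```python
def c0_action(cache, request, p, c):
    """Rule (3): minimize p(j)c(j) over the cached documents AND the request.

    Returning the request itself means "do not cache the new document".
    """
    candidates = set(cache) | {request}
    return min(candidates, key=lambda j: (p[j] * c[j], j))


def mandatory_action(cache, p, c):
    """Rule (4): eviction is mandatory, so only cached documents are candidates."""
    return min(cache, key=lambda j: (p[j] * c[j], j))


if __name__ == "__main__":
    p = {1: 0.5, 2: 0.3, 3: 0.2}
    c = {1: 1.0, 2: 2.0, 3: 1.0}
    # Document 3 has the smallest p(j)c(j), so C0 declines to cache it,
    # whereas rule (4) must evict a cached document to make room for it.
    print(c0_action({1, 2}, 3, p, c), mandatory_action({1, 2}, p, c))
```

In this toy instance the two rules disagree exactly when the requested document is the least valuable one among the cache contents plus the request.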

The ability to find provably optimal policies under an arbitrary cost structure can be put to advantage in the following way: As in most complex engineering systems, multiple performance metrics need to be considered when operating caches, sometimes leading to conflicting objectives. For instance, managing the cache to achieve as small a miss rate as possible does not necessarily ensure that the average latency of retrieved documents is as small as it could be, since the latter performance metric typically depends on the size of retrieved documents while the former does not. One possible approach to capture the multi-criteria aspects is to introduce constraints. In the second part of the paper we formulate the problem of finding a constrained replacement policy that minimizes an average cost under a single constraint in terms of a long-run average metric. Using the developments indicated above we are able to identify the structure of the constrained optimal policy as a randomized Markov stationary policy obtained by randomizing two simple policies of the type (3). The analysis relies on a simplified version of a methodology developed in the context of MDPs with a constraint in [2].

The paper is organized as follows: The search for optimal replacement policies with optional eviction is formulated as a Markov decision process in Section II. Its solution is discussed in Section III, and Section IV is devoted to the constrained problem.


II.  Finding  Good  Replacement  Policies 

One approach for designing good replacement policies is to couch the problem as one of sequential decision making in the presence of randomness. The analysis that produced the policy $A_0$ described earlier (and its optimality under the Independent Reference Model) is one based on Dynamic Programming as developed in the framework of MDPs [5] [11].

A.  An  MDP  framework 

The  system  is  composed  of  a  server  where  a  copy  of 
each  of  its  N  documents  is  available,  and  of  a  cache  of 
size  M  with  1  <  M  <  N .  Documents  are  first  requested 
at  the  cache:  If  the  requested  document  has  a  copy  already 
in  cache  (i.e.,  a  hit),  this  copy  is  downloaded  by  the  user  at 
some  cost  (e.g.,  latency).  If  the  requested  document  is  not 
in  cache  (i.e.,  a  miss),  a  copy  is  requested  from  the  server 
to  be  put  in  the  cache.  If  the  cache  is  already  full,  then 
a  decision  needs  to  be  taken  as  to  whether  a  document 
already  in  cache  will  be  evicted  (to  make  place  for  the 
copy  of  document  just  requested)  and  if  so,  which  one. 
In  principle  this  decision  is  taken  on  the  basis  of  earlier 
decisions  and  past  requests,  and  seeks  to  minimize  a  cost 
function  associated  with  the  operation  of  the  cache  over 
either  a  finite  horizon  or  an  infinite  horizon. 

Decision epochs are defined as the instants at which requests for documents are presented at the cache, and are indexed by $t = 0, 1, \ldots$. At time $t = 0, 1, \ldots$, let $S_t$ denote the state of the cache; thus $S_t$ is a subset of $\{1, \ldots, N\}$ with size $|S_t| \le M$. Let $\mathcal{S}_M$ denote the collection of all subsets of $\{1, \ldots, N\}$ of size less than or equal to $M$.

Let $\{R_t, \ t = 0, 1, \ldots\}$ denote the sequence of document requests, with $R_t$ an $\{1, \ldots, N\}$-valued random variable (rv). When the request $R_t$ is made, the state of the cache is $S_t$, and we let $U_t$ denote the action prompted by the request $R_t$. If the request $R_t$ is already in cache, then we use the convention $U_t = 0$ to denote the fact that no replacement decision needs to be taken. On the other hand, if the request $R_t$ is not in the cache, then $U_t$ takes value in $S_t + R_t$ and identifies the document to be removed: If $U_t$ is selected in $S_t$, then an eviction takes place with the document $U_t$ removed from the cache and replaced by $R_t$. On the other hand, if $U_t = R_t$, then no document is replaced. Thus, the resulting cache state $S_{t+1}$ (just before the next request $R_{t+1}$ is made) is given by²

$$S_{t+1} = T(S_t, R_t, U_t) = \begin{cases} S_t & \text{if } R_t \in S_t \\ S_t + R_t & \text{if } R_t \notin S_t, \ |S_t| < M \\ S_t + R_t - U_t & \text{if } R_t \notin S_t, \ |S_t| = M. \end{cases}$$

² Throughout, for any subset $S$ of $\{1, \ldots, N\}$ and any elements $x$ and $u$ in $\{1, \ldots, N\}$, we write $S + x - u$ to denote the subset of $\{1, \ldots, N\}$ obtained from $S$ by adding $x$ to it and removing $u$ from the resulting set, in that order.

In this formulation we note that a document is not necessarily evicted from the cache if the requested document is not in cache and the cache is full. Under these rules of operation, we note that eventually the cache will become full at some time and will remain so from that time onward, i.e., given any initial cache $S_0$, there exists $\tau$ finite such that $|S_{\tau + t}| = M$ for all $t = 0, 1, \ldots$.
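The state transition described above is simple enough to write down directly. The Python sketch below is one possible encoding, under the assumptions that cache states are represented as frozensets of integer document identifiers and that the function name `transition` is chosen here for illustration only.

```python
def transition(S, r, u, M):
    """Cache transition T(S, r, u) for a cache of size M.

    S is the current cache content, r the requested document, and u the
    action: u is ignored on a hit or when a free slot exists; otherwise u
    must lie in S + r, and u == r means that no document is evicted.
    """
    S = frozenset(S)
    if r in S:
        return S                  # hit: the cache is unchanged
    if len(S) < M:
        return S | {r}            # miss with a free slot: place the request
    if u not in S | {r}:
        raise ValueError("action must be a cached document or the request")
    return (S | {r}) - {u}        # miss with a full cache: S + r - u
```

Adding the request first and removing `u` afterwards mirrors the order prescribed for $S + x - u$, so that `u == r` leaves the cache unchanged.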

The state variable at time $t = 0, 1, \ldots$ is the pair $(S_t, R_t)$. The state space is then the set $\mathcal{X}$ given by

$$\mathcal{X} := \mathcal{S}_M \times \{1, \ldots, N\}.$$

The information available to make a decision when the document $R_t$ ($t = 0, 1, \ldots$) is requested is encapsulated in the random variable $H_t$ defined recursively by

$$H_{t+1} = (H_t, U_t, S_{t+1}, R_{t+1}), \quad t = 0, 1, \ldots$$

with $H_0 = (S_0, R_0)$. Thus, the range $\mathbb{H}_t$ of $H_t$ can be defined recursively by

$$\mathbb{H}_{t+1} = \mathbb{H}_t \times \{0, \ldots, N\} \times \mathcal{X}, \quad t = 0, 1, \ldots$$

with $\mathbb{H}_0 = \mathcal{X}$. The decision $U_t$ implemented in response to request $R_t$ is then

$$U_t = \pi_t(H_t)$$

for some mapping $\pi_t : \mathbb{H}_t \to \{0, 1, \ldots, N\}$. The collection $\pi = (\pi_t, \ t = 0, 1, \ldots)$ defines the replacement (or eviction) policy $\pi$. Sometimes it is useful to consider randomized policies, which are now defined: A randomized replacement policy $\pi$ is a collection $(\pi_t, \ t = 0, 1, \ldots)$ of mappings $\pi_t : \{0, 1, \ldots, N\} \times \mathbb{H}_t \to [0, 1]$ such that for all $t = 0, 1, \ldots$, we have

$$\sum_{u=0}^{N} \pi_t(u; H_t) = 1$$

with

$$\pi_t(u; H_t) = 0, \quad R_t \notin S_t \text{ and } u \notin S_t + R_t$$

and

$$\pi_t(u; H_t) = \delta(u; 0), \quad R_t \in S_t \text{ or } |S_t| < M$$

for all $u = 0, 1, \ldots, N$. The class of all (possibly randomized) replacement policies is denoted by $\mathcal{P}$.

If the replacement policy $\pi$ has the property that

$$U_t = f_t(S_t, R_t), \quad t = 0, 1, \ldots$$

for mappings $f_t : \mathcal{X} \to \{0, 1, \ldots, N\}$, we say that $\pi$ is a Markov policy. If in addition, $f_t = f$ for all $t = 0, 1, \ldots$, the policy is said to be a (Markov) stationary policy, in which case the policy is identified with the mapping $f$ itself. Similar definitions can be given for randomized Markov stationary policies [5].

Under the Independent Reference Model, the sequence of requests is a sequence $\{R_t, \ t = 0, 1, \ldots\}$ of i.i.d. $\{1, \ldots, N\}$-valued rvs distributed according to some pmf $p = (p(1), \ldots, p(N))$ on $\{1, \ldots, N\}$.

The definition of the underlying MDP is completed by associating with each admissible policy $\pi$ in $\mathcal{P}$ a probability measure $\mathbf{P}_\pi$ defined through the following requirements: For each $t = 0, 1, \ldots$, we have

$$\mathbf{P}_\pi[U_t = u \mid H_t] = \pi_t(u; H_t), \quad u = 0, \ldots, N \tag{5}$$

and

$$\mathbf{P}_\pi[S_{t+1} = S', R_{t+1} = y \mid H_t, U_t] = p(y)\, \mathbf{P}_\pi[S_{t+1} = S' \mid H_t, U_t] = p(y)\, \mathbf{1}[T(S_t, R_t, U_t) = S'] \tag{6}$$

for every state $(S', y)$ in $\mathcal{X}$. Let $\mathbf{E}_\pi$ denote the expectation operator associated with the probability measure $\mathbf{P}_\pi$.

B.  The  cost  functionals 

With any one-step cost function $c : \{1, \ldots, N\} \to \mathbb{R}_+$, we associate several cost functions: Fix a replacement policy $\pi$ in $\mathcal{P}$. For each $T = 0, 1, \ldots$, define the total cost over the horizon $[0, T]$ under the policy $\pi$ by

$$J_c(\pi; T) = \mathbf{E}_\pi\left[ \sum_{t=0}^{T} \mathbf{1}[R_t \notin S_t]\, c(R_t) \right].$$

The average cost (over the entire horizon) under the policy $\pi$ is then defined by

$$J_c(\pi) = \limsup_{T \to \infty} \frac{1}{T+1} J_c(\pi; T) = \limsup_{T \to \infty} \frac{1}{T+1} \mathbf{E}_\pi\left[ \sum_{t=0}^{T} \mathbf{1}[R_t \notin S_t]\, c(R_t) \right]. \tag{7}$$

We use the limsup operation in the definition above since under an arbitrary policy $\pi$ the limit in (7) may not exist; this is standard practice in the theory of MDPs.
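For intuition, the average cost (7) of a Markov stationary policy can be approximated by simulating a single long request trace under the Independent Reference Model. The sketch below is a rough Monte Carlo estimate and not an exact evaluation: the function name `simulate_average_cost`, the policy signature `policy(cache, r)` returning an element of the cache or `r` itself, and the empty initial cache are all assumptions made for this illustration.

```python
import random


def simulate_average_cost(policy, p, c, M, T, seed=0):
    """Estimate the average cost over requests t = 0, ..., T along one sample path.

    policy(cache, r) is called only on a miss with a full cache and must
    return the document to drop: a cached document, or r to skip caching.
    """
    rng = random.Random(seed)
    docs = sorted(p)
    weights = [p[j] for j in docs]
    cache = set()                 # assumed initial state: empty cache
    total = 0.0
    for _ in range(T + 1):
        r = rng.choices(docs, weights=weights)[0]   # i.i.d. request with pmf p
        if r in cache:
            continue              # hit: no cost is incurred
        total += c[r]             # miss: pay the retrieval cost c(r)
        if len(cache) < M:
            cache.add(r)
        else:
            u = policy(cache, r)
            cache.add(r)          # S + r - u, in that order
            cache.discard(u)
    return total / (T + 1)
```

With `c` identically equal to one the returned value is an empirical miss rate, and passing `lambda cache, r: min(cache | {r}, key=lambda j: p[j] * c[j])` as the policy gives an estimate for the rule (3).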

A number of situations can be handled by adequately specializing the cost-per-step $c$: Indeed, if $c(y) = 1$ ($y = 1, \ldots, N$), then $J_c(\pi; T)$ and $J_c(\pi)$ are the expected number of cache misses over the horizon $[0, T]$ and the average miss rate under policy $\pi$, respectively. On the other hand, if $c$ is taken to be the size function $s : \{1, \ldots, N\} \to \mathbb{N}$, with $s(y)$ denoting the size (in bytes) of doc(y) ($y = 1, \ldots, N$), then the byte hit rate under policy $\pi$ can be defined by

$$\mathrm{BHR}(\pi) = \liminf_{T \to \infty} \frac{\mathbf{E}_\pi\left[ \sum_{t=0}^{T} \mathbf{1}[R_t \in S_t]\, s(R_t) \right]}{\mathbf{E}_\pi\left[ \sum_{t=0}^{T} s(R_t) \right]} \tag{8}$$

where the liminf operation reflects the fact that this performance is maximized. To make use of the MDP framework used here, we first note that

$$\lim_{T \to \infty} \frac{1}{T+1} \mathbf{E}_\pi\left[ \sum_{t=0}^{T} s(R_t) \right] = \mathbf{E}[s(R)]$$

for some $\{1, \ldots, N\}$-valued rv $R$ with pmf $p$. Next, we see that

$$\mathrm{BHR}(\pi) = 1 - \limsup_{T \to \infty} \frac{\mathbf{E}_\pi\left[ \sum_{t=0}^{T} \mathbf{1}[R_t \notin S_t]\, s(R_t) \right]}{\mathbf{E}_\pi\left[ \sum_{t=0}^{T} s(R_t) \right]} = 1 - \limsup_{T \to \infty} \frac{\frac{1}{T+1} \mathbf{E}_\pi\left[ \sum_{t=0}^{T} \mathbf{1}[R_t \notin S_t]\, s(R_t) \right]}{\frac{1}{T+1} \mathbf{E}_\pi\left[ \sum_{t=0}^{T} s(R_t) \right]} = 1 - \frac{J_s(\pi)}{\mathbf{E}[s(R)]}.$$

Hence, maximizing the byte hit rate is equivalent to minimizing the average cost associated with $s$.

The basic problem we address is that of finding a cache replacement policy $\pi^\star$ in $\mathcal{P}$ such that

$$J_c(\pi^\star) \le J_c(\pi), \quad \pi \in \mathcal{P}.$$

We refer to any such policy $\pi^\star$ as an optimal replacement policy. It is not necessarily unique, but in the next section we identify such an optimal policy $\pi^\star$ which also happens to be a Markov stationary policy.

III. Non-uniform cost optimal replacement policy without mandatory eviction

In this section we discuss the optimal cache replacement policy for non-uniform costs under the Independent Reference Model when eviction is not mandated. A useful characterization of the optimal policy for the corresponding MDP (being one with finite state and action spaces) can be initiated with the help of the Dynamic Programming Equation (DPE) [11].

A. The optimal replacement policy

For each $T = 0, 1, \ldots$, we define the cost-to-go associated with the policy $\pi$ in $\mathcal{P}$ starting in the initial state $(S, r)$ in $\mathcal{X}$ to be

$$J_c^\pi((S, r); T) := \mathbf{E}_\pi\left[ \sum_{t=0}^{T} \mathbf{1}[R_t \notin S_t]\, c(R_t) \,\Big|\, S_0 = S, R_0 = r \right].$$

Next, the value function over the horizon $[0, T]$ is defined by

$$V_T(S, r) := \inf_{\pi \in \mathcal{P}} J_c^\pi((S, r); T), \quad (S, r) \in \mathcal{X}.$$

Recall that regardless of the initial condition, the cache will eventually become and remain full. Thus, under the average cost criterion used here, there is no loss of generality in assuming the state space to be $\mathcal{X}^\star$ (instead of the original $\mathcal{X}$) with

$$\mathcal{X}^\star := \{(S, r) \in \mathcal{X} : |S| = M\}.$$

For the MDP at hand, the DPE takes the form

$$V_{T+1}(S, r) = \mathbf{1}[r \in S]\, \mathbf{E}[V_T(S, R^\star)] + \mathbf{1}[r \notin S] \left( c(r) + \min_{u \in S + r} \mathbf{E}[V_T(S + r - u, R^\star)] \right) \tag{9}$$

for every state $(S, r)$ in $\mathcal{X}^\star$, with $R^\star$ denoting an $\{1, \ldots, N\}$-valued rv with pmf $p$. The possibility of non-eviction is reflected in the choice $u = r$ (obviously in $S + r$). Moreover, as is well known [11], the optimal action to be taken in state $(S, r)$ at time $t = 0$ when minimizing the cost criterion over the horizon $[0, T]$ is simply given by

$$g_T(S, r) := \arg\min_{u \in S + r} \left( \mathbf{E}[V_T(S + r - u, R^\star)] \right),$$

say with a lexicographic tie-breaker for the sake of concreteness.

The main result of this section is contained in the following theorem that prescribes the optimal replacement policy for the caching problem at hand.

Theorem 1: For each $T = 0, 1, \ldots$, we have the identification

$$g_T(S, r) = g^\star(S, r) \tag{10}$$

for any state $(S, r)$ in $\mathcal{X}^\star$ whenever $r$ is not in $S$, with

$$g^\star(S, r) := \arg\min_{u \in S + r} \left( p(u) c(u) \right). \tag{11}$$

The proof of Theorem 1 is given in Appendix A. Note that $g^\star$ does not depend on $T$, and that the Markov stationary policy associated with $g^\star$ is the policy $C_0$ introduced earlier. It is now plain from Theorem 1 that the Markov stationary policy $C_0$ is optimal for both the finite and infinite horizon cost problems.
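Theorem 1 can be checked numerically on small instances by iterating the dynamic programming recursion over full cache states and confirming that the action minimizing $p(u)c(u)$ also attains the minimum in the recursion. The Python sketch below is a brute-force sanity check, not the proof of Appendix A; the function name `check_theorem1`, the terminal condition $V_0(S, r) = \mathbf{1}[r \notin S]\, c(r)$ and the numerical tolerance are assumptions of this illustration.

```python
from itertools import combinations


def check_theorem1(p, c, M, horizon):
    """Return True if the rule arg min p(u)c(u) is DP-optimal at every stage checked."""
    docs = sorted(p)
    full = [frozenset(s) for s in combinations(docs, M)]
    # W[S] = E[V_T(S, R*)]; with V_0(S, r) = 1[r not in S] c(r) this is the
    # total p(y)c(y) mass of the documents outside S.
    W = {S: sum(p[y] * c[y] for y in docs if y not in S) for S in full}
    for _ in range(horizon):
        V = {}
        for S in full:
            for r in docs:
                if r in S:
                    V[S, r] = W[S]          # hit: no cost, cache unchanged
                    continue
                candidates = S | {r}
                options = {u: W[candidates - {u}] for u in candidates}
                best = min(options.values())
                u_star = min(candidates, key=lambda u: (p[u] * c[u], u))
                if options[u_star] > best + 1e-12:
                    return False            # the claimed rule is not a minimizer
                V[S, r] = c[r] + best
        W = {S: sum(p[r] * V[S, r] for r in docs) for S in full}
    return True
```

The state space grows combinatorially in $N$ and $M$, so this check is only practical for toy instances.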


B.  Evaluation  of  the  optimal  cost 

In order to calculate the average cost, byte hit rate, and other interesting properties of the replacement policy of Theorem 1, we find it useful to introduce the permutation $\sigma$ of $\{1, \ldots, N\}$ which orders the values $p(i)c(i)$ ($i = 1, \ldots, N$) in decreasing order, namely

$$p(\sigma(1))c(\sigma(1)) \ge p(\sigma(2))c(\sigma(2)) \ge \ldots \ge p(\sigma(N))c(\sigma(N)). \tag{12}$$

The key observation is that the long term usage of the optimal replacement policy $C_0$ results in a set of $M$ fixed documents in the cache, namely $\{\sigma(1), \ldots, \sigma(M)\}$, so that every document in the set $\{\sigma(1), \ldots, \sigma(M)\}$ is never evicted from the cache once requested. If we write

$$S^\star := \{\sigma(1), \ldots, \sigma(M)\} \tag{13}$$

for this steady-state stack, then formally

$$\lim_{t \to \infty} \mathbf{P}_{C_0}[\sigma(i) \in S_t] = 0, \quad i = M+1, \ldots, N$$

and

$$J_d(C_0) = \sum_{i \notin S^\star} p(i) d(i) = \sum_{i=M+1}^{N} p(\sigma(i))\, d(\sigma(i)) \tag{14}$$

for any cost $d : \{1, \ldots, N\} \to \mathbb{R}_+$ (and in particular the cost $c : \{1, \ldots, N\} \to \mathbb{R}_+$ which induces the policy $C_0$). Thus, the byte hit rate associated with the policy $C_0$ is simply given by

$$\mathrm{BHR}(C_0) = \frac{\sum_{j=1}^{M} p(\sigma(j))\, s(\sigma(j))}{\mathbf{E}[s(R)]}. \tag{15}$$
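The closed-form expressions (13)-(15) translate directly into code. The following Python sketch computes the steady-state cache content, the average cost under an arbitrary cost $d$, and the byte hit rate; the function names are hypothetical, and ties in the ordering (12) are broken by document index, which is an assumption of this illustration.

```python
def steady_state(p, c, M):
    """Documents sigma(1), ..., sigma(M): the M largest values of p(i)c(i), as in (13)."""
    order = sorted(p, key=lambda i: (-p[i] * c[i], i))
    return order[:M]


def average_cost(p, c, d, M):
    """J_d(C0) as in (14): the p(i)d(i) mass of documents outside the steady-state set."""
    kept = set(steady_state(p, c, M))
    return sum(p[i] * d[i] for i in p if i not in kept)


def byte_hit_rate(p, c, s, M):
    """BHR(C0) as in (15), with E[s(R)] computed from the pmf p."""
    kept = steady_state(p, c, M)
    mean_size = sum(p[j] * s[j] for j in p)
    return sum(p[j] * s[j] for j in kept) / mean_size
```

For example, with `p = {1: 0.5, 2: 0.3, 3: 0.2}`, unit costs and `M = 2`, the steady-state set is `[1, 2]` and the miss rate `average_cost(p, c, c, 2)` equals the probability of document 3.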


Another interesting observation is the relation of the optimal replacement policy $C_0$ to the well-established Greedy Dual* and Greedy Dual-Size replacement policies described in [6] and [7]. Let $c_{GD} : \{1, \ldots, N\} \to \mathbb{R}_+$ be an arbitrary cost used by the Greedy Dual policies. The Greedy Dual policies under optional document placement in case of a cache miss prescribe

$$\text{Evict doc}(i) \quad \text{if} \quad i = \arg\min_{j \in S + r} \left( L + \left( \frac{p(j)\, c_{GD}(j)}{s(j)} \right)^{1/\beta} \right) \tag{16}$$

where $L$ is a contribution of the temporal locality of reference to the replacement policy and $\beta > 0$ is a weight factor that modulates the contribution of the probability of reference, document size and document cost to the eviction decision. Under the Independent Reference Model used here, the temporal locality factor $L$ can be taken to be zero, in which case the Greedy Dual policy simplifies to

$$\text{Evict doc}(i) \quad \text{if} \quad i = \arg\min_{j \in S + r} \frac{p(j)\, c_{GD}(j)}{s(j)}$$

which is a special case of the optimal replacement policy $C_0$ associated with the cost function $c : \{1, \ldots, N\} \to \mathbb{R}_+$ given by

$$c(i) = \frac{c_{GD}(i)}{s(i)}, \quad i = 1, \ldots, N.$$


C.  Implementing  the  optimal  policy 

A natural implementation of the optimal replacement policy $C_0$ is achieved by invoking the Certainty Equivalence Principle. In addition to the online estimation of the probability of references (as was the case for (2)), this approach now requires the estimation of additional parameters which enter the definition of the overall document cost ($c(j)$, $j = 1, \ldots, N$), e.g., in the case of document latency, the document size might be fully known but the available bandwidth to the server needs to be measured online at request time. Let ($\hat{c}_K(j)$, $j = 1, \ldots, N$) denote an estimate of the document costs which are available at the cache at the time instance of the $K$-th request: If $|S_t| < M$, document placement always takes place; otherwise the replacement action is dictated by

$$\text{Evict doc}(i) \quad \text{if} \quad i = \arg\min_{j \in S_t + R_t} \left( \hat{p}_K(j)\, \hat{c}_K(j) \right).$$
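A minimal online version of this certainty-equivalent rule is sketched below in Python. It is only one possible realization under explicit assumptions: the class name `CertaintyEquivalentC0` is hypothetical, the popularity estimate is the empirical request frequency, and the cost estimate of a document is simply the most recent cost observed for it (for instance a measured download latency), which is a choice made here for illustration and not one prescribed by the paper.

```python
from collections import Counter


class CertaintyEquivalentC0:
    """Online policy C0 with estimated popularities and costs (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = set()
        self.counts = Counter()   # request counts, used for the popularity estimate
        self.requests = 0         # K, the number of requests seen so far
        self.cost = {}            # latest observed cost per document (assumed estimator)

    def request(self, doc, observed_cost):
        """Serve one request; return True on a hit and False on a miss."""
        self.requests += 1
        self.counts[doc] += 1
        self.cost[doc] = observed_cost
        if doc in self.cache:
            return True
        if len(self.cache) < self.capacity:
            self.cache.add(doc)   # free slot: placement always takes place
            return False
        # Full cache: minimize the estimated p(j)c(j) over the cache plus the request.
        candidates = self.cache | {doc}
        victim = min(
            candidates,
            key=lambda j: (self.counts[j] / self.requests * self.cost[j], j),
        )
        self.cache.add(doc)
        self.cache.discard(victim)  # victim == doc means the request is not cached
        return False
```

Every cached document has been requested at least once, so a cost estimate is always available for each candidate; smoother estimators (e.g., moving averages of measured latency) could be substituted without changing the structure.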


IV.  Optimal  caching  under  a  constraint 

One possible approach to capture the multi-criteria aspect of running caching systems is to introduce constraints. Here, we revisit the caching problem studied in Section III under a single constraint.


A.  Problem  Formulation 

Formulating the caching problem under a single constraint requires two cost functions, say $c, d : \{1, \ldots, N\} \to \mathbb{R}_+$. As before, $c(R_t)$ and $d(R_t)$ represent different costs of retrieving the requested document $R_t$ if not in the cache $S_t$ at time $t$. For instance, we could take

$$c(y) = 1 \quad \text{and} \quad d(y) = s(y), \quad y = 1, \ldots, N \tag{17}$$

to reflect interest in miss rate and document retrieval latency, respectively.

The problem of interest can now be formulated as follows: Given some $\alpha > 0$, we say that the policy $\pi$ in $\mathcal{P}$ satisfies the constraint at level $\alpha$ if

$$J_d(\pi) \le \alpha. \tag{18}$$

Let $\mathcal{P}(d; \alpha)$ denote the class of all cache replacement policies in $\mathcal{P}$ that satisfy the constraint (18) at level $\alpha$.

The problem is to find a cache replacement policy $\pi^\star$ in $\mathcal{P}(d; \alpha)$ such that

$$J_c(\pi^\star) \le J_c(\pi), \quad \pi \in \mathcal{P}(d; \alpha).$$

We refer to any such policy $\pi^\star$ as a constrained optimal policy (at level $\alpha$). With the choice (17) this formulation would focus on minimizing the miss rate with a bound on average latency of document retrieval (under the assumption that retrieval latency is proportional to the size of the document to be retrieved).

One  natural  approach  to  solving  this  problem  is  to  con¬ 
sider  the  corresponding  Lagrangian  functional  defined  by 

Jx{n)  =  Jc{n)  +  XJd{n),  n  G  V,  A  >  0.  (19) 

The basic idea is then to find, for each $\lambda \ge 0$, a cache replacement policy $\pi^*(\lambda)$ in $\mathcal{P}$ such that

$$J_\lambda(\pi^*(\lambda)) \le J_\lambda(\pi), \qquad \pi \in \mathcal{P}. \tag{20}$$

Now, if for some $\lambda^* \ge 0$, the policy $\pi^*(\lambda^*)$ happens to saturate the constraint at level $\alpha$, i.e.,

$$J_d(\pi^*(\lambda^*)) = \alpha,$$

then the optimality of $\pi^*(\lambda^*)$ implies

$$J_{\lambda^*}(\pi^*(\lambda^*)) \le J_{\lambda^*}(\pi), \qquad \pi \in \mathcal{P}.$$

In particular, for any policy $\pi$ in $\mathcal{P}(d;\alpha)$, this last inequality readily leads to

$$J_c(\pi^*(\lambda^*)) \le J_c(\pi), \qquad \pi \in \mathcal{P}(d;\alpha),$$

and the policy $\pi^*(\lambda^*)$ solves the constrained optimization problem.

The only glitch in this approach resides in the use of the limsup operation in the definition (7), so that $J_\lambda(\pi)$ is not necessarily the long-run average cost under policy $\pi$ for some appropriate one-step cost. Thus, finding the optimal cache replacement policy $\pi^*(\lambda)$ specified by (20) cannot be achieved in a straightforward manner.

B. A Lagrangian approach

Following the treatment in [2], we now introduce an alternate Lagrangian formulation which circumvents this technical difficulty and allows us eventually to carry out the program outlined above: For each $\lambda \ge 0$, we define the one-step cost function $b_\lambda : \{1, \ldots, N\} \to \mathbb{R}_+$ by

$$b_\lambda(y) := c(y) + \lambda d(y), \qquad y = 1, \ldots, N$$

and consider the corresponding long-run average functional (7), i.e., for any policy $\pi$ in $\mathcal{P}$, we set

$$J_{b_\lambda}(\pi) := \limsup_{T \to \infty} \frac{1}{T+1} \mathbf{E}_\pi \left[ \sum_{t=0}^{T} \mathbf{1}[R_t \notin S_t] \, b_\lambda(R_t) \right]. \tag{21}$$

With these definitions we get

$$J_{b_\lambda}(\pi) \le J_\lambda(\pi), \qquad \pi \in \mathcal{P}$$

by standard properties of the limsup, with equality

$$J_{b_\lambda}(\pi) = J_\lambda(\pi)$$

whenever $\pi$ is a Markov stationary policy.

For each $\lambda \ge 0$, the (unconstrained) caching problem associated with the cost $b_\lambda$ is an MDP with finite state and action spaces. Thus, there exists a Markov stationary policy, denoted $g_\lambda$, which is optimal, i.e.,

$$J_{b_\lambda}(g_\lambda) \le J_{b_\lambda}(\pi), \qquad \pi \in \mathcal{P}$$

and earlier remarks yield

$$J_\lambda(g_\lambda) \le J_\lambda(\pi), \qquad \pi \in \mathcal{P}.$$

In other words, the Markov stationary policy $g_\lambda$ also minimizes the Lagrangian functional (19), and the relation

$$J_\lambda(g_\lambda) = \inf_{\pi \in \mathcal{P}} J_\lambda(\pi) = \inf_{\pi \in \mathcal{P}} J_{b_\lambda}(\pi) \tag{22}$$

holds. Consequently, as argued in Section IV-A, if for some $\lambda^* \ge 0$, the policy $g_{\lambda^*}$ saturates the constraint at level $\alpha$, then the policy $g_{\lambda^*}$ will solve the constrained optimization problem.

The difficulty of course is that a priori we may have $J_d(g_\lambda) \ne \alpha$ for all $\lambda \ge 0$. However, the arguments given above still show that the search for the constrained optimal policy can be recast as the problem of finding $\gamma \ge 0$ and a Markov stationary policy $g^*$ such that

$$J_d(g^*) = \alpha \tag{23}$$

and

$$J_\gamma(g^*) \le J_\gamma(\pi), \qquad \pi \in \mathcal{P}. \tag{24}$$
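To see why a pair $(\gamma, g^*)$ satisfying (23) and (24) is enough, the argument of Section IV-A can be written out as a single chain of inequalities. The following is a worked restatement using only (18), (19), (23) and (24); no new assumption is introduced.

```latex
% For any policy \pi in \mathcal{P}(d;\alpha), i.e., with J_d(\pi) \le \alpha:
\begin{align*}
J_c(g^*) + \gamma\alpha
  &= J_c(g^*) + \gamma J_d(g^*)   && \text{by (23)} \\
  &= J_\gamma(g^*)                && \text{by (19)} \\
  &\le J_\gamma(\pi)              && \text{by (24)} \\
  &= J_c(\pi) + \gamma J_d(\pi)   && \text{by (19)} \\
  &\le J_c(\pi) + \gamma\alpha    && \text{since } \gamma \ge 0 \text{ and } J_d(\pi) \le \alpha .
\end{align*}
% Cancelling \gamma\alpha gives J_c(g^*) \le J_c(\pi) for every \pi in \mathcal{P}(d;\alpha).
```

Since $g^*$ itself belongs to $\mathcal{P}(d;\alpha)$ by (23), it is a constrained optimal policy at level $\alpha$.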
C. On the way to solving the constrained MDP

The appropriate multiplier $\gamma$ and the policy $g^*$ appearing in (23) and (24) will be identified in Section IV-D. To help us in this process we need some technical facts and notation which we now develop.

Theorem 2: The optimal cost function $\lambda \to J_\lambda(g_\lambda)$ is a non-decreasing concave function which is piecewise linear on $\mathbb{R}_+$.

Some observations are in order before giving a proof of Theorem 2: Fix $\lambda \ge 0$. In view of Theorem 1 we can select $g_\lambda$ as the policy $C_0$ induced by $b_\lambda$, i.e.,

$$\text{Evict doc}(i) \ \text{ iff } \ i = \arg\min_{j \in S + r} \, p(j) \left( c(j) + \lambda d(j) \right). \tag{25}$$

Let $\sigma_\lambda$ denote the permutation of $\{1, \ldots, N\}$ which orders the values $p(i) b_\lambda(i)$ ($i = 1, \ldots, N$) in decreasing order, namely

$$p(\sigma_\lambda(1)) b_\lambda(\sigma_\lambda(1)) \ge p(\sigma_\lambda(2)) b_\lambda(\sigma_\lambda(2)) \ge \ldots \tag{26}$$

with a lexicographic tie-breaker. Let $S(\lambda)$ denote the steady-state stack induced by the policy $g_\lambda$, namely the collection of documents in the cache that results from long-term usage of the policy $g_\lambda$. Obviously, we have^3

$$S(\lambda) = \{ \sigma_\lambda(1), \ldots, \sigma_\lambda(M) \} \tag{27}$$

so that

$$J_\lambda(g_\lambda) = J_{b_\lambda}(g_\lambda) = \sum_{i \notin S(\lambda)} p(i) b_\lambda(i) \tag{28}$$

upon rephrasing comments made earlier in Section III.
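The quantities in (26)-(28) are straightforward to compute for a given instance. The following Python fragment is a minimal sketch, not taken from the paper: the function names, the 0-based document indices and the four-document instance are all illustrative assumptions.

```python
from typing import List, Sequence, Set

def sigma(p: Sequence[float], c: Sequence[float], d: Sequence[float], lam: float) -> List[int]:
    """Documents (0-based) sorted by decreasing p(i) * b_lam(i), ties broken by index, cf. (26)."""
    b = [c[i] + lam * d[i] for i in range(len(p))]
    return sorted(range(len(p)), key=lambda i: (-p[i] * b[i], i))

def steady_state_stack(p, c, d, lam: float, M: int) -> Set[int]:
    """S(lam): the M documents with the largest p(i) * b_lam(i), cf. (27)."""
    return set(sigma(p, c, d, lam)[:M])

def lagrangian_cost(p, c, d, lam: float, M: int) -> float:
    """J_lam(g_lam) as in (28): expected one-step cost of requests that miss S(lam)."""
    S = steady_state_stack(p, c, d, lam, M)
    return sum(p[i] * (c[i] + lam * d[i]) for i in range(len(p)) if i not in S)

# Hypothetical instance in the spirit of (17): unit miss cost, sizes as d, cache size M = 2.
p = [0.4, 0.3, 0.2, 0.1]
c = [1.0, 1.0, 1.0, 1.0]
d = [1.0, 2.0, 8.0, 4.0]

print(sigma(p, c, d, 0.0))                   # ordering by popularity alone
print(steady_state_stack(p, c, d, 1.0, 2))   # stack once size is weighted in
print(lagrangian_cost(p, c, d, 0.0, 2))      # miss rate of the unconstrained policy
```

With $\lambda = 0$ the stack holds the two most popular documents; raising $\lambda$ shifts it toward documents with large $p(i)d(i)$.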

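Given the affine nature (in the variable $\lambda$) of the cost, there must exist a finite and strictly increasing sequence of non-zero scalar values $\lambda_1, \ldots, \lambda_L$ in $\mathbb{R}_+$ with $0 < \lambda_1 < \ldots < \lambda_L$ such that for each $\ell = 0, \ldots, L$, it holds that

$$S(\lambda) = S(\lambda_\ell), \qquad \lambda \in I_\ell := [\lambda_\ell, \lambda_{\ell+1})$$

with the convention $\lambda_0 = 0$ and $\lambda_{L+1} = \infty$, but with

$$S(\lambda_\ell) \ne S(\lambda_{\ell+1}), \qquad \ell = 0, \ldots, L-1.$$

In view of (28) it is plain that

$$J_\lambda(g_\lambda) = \sum_{i \notin S(\lambda_\ell)} p(i) b_\lambda(i) \tag{29}$$

whenever $\lambda$ belongs to $I_\ell$ for some $\ell = 0, \ldots, L$.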
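The breakpoints $\lambda_1, \ldots, \lambda_L$ can be located numerically, since the stack can only change where two of the lines $\lambda \to p(i) b_\lambda(i)$ cross. The sketch below is one possible way to do this, under stated assumptions: the helper names `stack` and `breakpoints` and the sample instance are hypothetical, and the stack is probed strictly inside each candidate interval so that tie-breaking at the crossings themselves plays no role.

```python
from itertools import combinations

def stack(p, c, d, lam, M):
    """S(lam): the M documents with the largest p(i) * (c(i) + lam * d(i))."""
    order = sorted(range(len(p)), key=lambda i: (-p[i] * (c[i] + lam * d[i]), i))
    return frozenset(order[:M])

def breakpoints(p, c, d, M):
    """Values lam > 0 at which S(lam) changes, searched among pairwise crossing points."""
    cands = set()
    for i, j in combinations(range(len(p)), 2):
        slope = p[i] * d[i] - p[j] * d[j]
        if slope != 0:
            # Solve p(i) * b_lam(i) = p(j) * b_lam(j) for lam.
            lam = (p[j] * c[j] - p[i] * c[i]) / slope
            if lam > 0:
                cands.add(lam)
    cands = sorted(cands)
    # One probe strictly inside each interval between consecutive candidates.
    edges = [0.0] + cands + [cands[-1] + 1.0 if cands else 1.0]
    probes = [(a + b) / 2 for a, b in zip(edges, edges[1:])]
    result, prev = [], stack(p, c, d, probes[0], M)
    for lam, probe in zip(cands, probes[1:]):
        cur = stack(p, c, d, probe, M)
        if cur != prev:          # the stack differs on the two sides of this crossing
            result.append(lam)
        prev = cur
    return result

# Hypothetical instance: unit miss cost, sizes as d, cache size M = 2.
p = [0.4, 0.3, 0.2, 0.1]
c = [1.0, 1.0, 1.0, 1.0]
d = [1.0, 2.0, 8.0, 4.0]
bp = breakpoints(p, c, d, 2)
print(bp)
```

Not every crossing is a breakpoint: a reversal between two documents that are both inside (or both outside) the stack leaves $S(\lambda)$ unchanged, which is why each candidate is checked against the stacks on either side.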

^3 The steady-state stack $S$ given by (13) corresponds to the case $\lambda = 0$ with $\sigma = \sigma_0$.


Proof. For each policy $\pi$ in $\mathcal{P}$, the quantities $J_c(\pi)$ and $J_d(\pi)$ are non-negative as the one-step cost functions $c$ and $d$ are assumed non-negative. Thus, the mapping $\lambda \to J_\lambda(\pi)$ is non-decreasing and affine, and we conclude from (22) that the mapping $\lambda \to J_\lambda(g_\lambda)$, being a pointwise infimum of such mappings, is indeed non-decreasing and concave. Its piecewise-linear character is a straightforward consequence of (29). ■

In order to proceed we now make the following simplifying assumption.

(A) If for some $\lambda \ge 0$, it holds that

$$p(i) b_\lambda(i) = p(j) b_\lambda(j)$$

for some distinct $i, j = 1, \ldots, N$, then there does not exist any $k \ne i, j$ with $k = 1, \ldots, N$ such that

$$p(i) b_\lambda(i) = p(j) b_\lambda(j) = p(k) b_\lambda(k).$$

Assumption (A) can be removed at the cost of a more delicate analysis without affecting the essence of the optimality result to be derived shortly.

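For each $\ell = 0, 1, \ldots, L$, the relative position of the quantities $p(i) b_\lambda(i)$ ($i = 1, \ldots, N$) remains unchanged as $\lambda$ sweeps through the interval $(\lambda_\ell, \lambda_{\ell+1})$. Under (A), when going through $\lambda = \lambda_{\ell+1}$, a single reversal occurs in the relative position with

$$S(\lambda_\ell) = \{ \sigma_{\lambda_\ell}(1), \ldots, \sigma_{\lambda_\ell}(M-1), \sigma_{\lambda_\ell}(M) \}$$

and

$$S(\lambda_{\ell+1}) = \{ \sigma_{\lambda_\ell}(1), \ldots, \sigma_{\lambda_\ell}(M-1), \sigma_{\lambda_\ell}(M+1) \}.$$

By continuity we must have

$$p(\sigma_{\lambda_\ell}(M)) \, b_{\lambda_{\ell+1}}(\sigma_{\lambda_\ell}(M)) = p(\sigma_{\lambda_\ell}(M+1)) \, b_{\lambda_{\ell+1}}(\sigma_{\lambda_\ell}(M+1)). \tag{30}$$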

Theorem 3: Under Assumption (A), the mapping $\lambda \to J_d(g_\lambda)$ is a non-increasing piecewise constant function on $\mathbb{R}_+$.

Proof. The analog of (29) holds in the form

$$J_d(g_\lambda) = \sum_{i \notin S(\lambda_\ell)} p(i) d(i) \tag{31}$$

whenever $\lambda$ belongs to $I_\ell$ for some $\ell = 0, \ldots, L$. Hence, the mapping $\lambda \to J_d(g_\lambda)$ is piecewise constant.

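Now pick $\ell = 0, 1, \ldots, L-1$ and consider $\lambda$ and $\mu$ in the open intervals $(\lambda_\ell, \lambda_{\ell+1})$ and $(\lambda_{\ell+1}, \lambda_{\ell+2})$, respectively. The desired monotonicity will be established if we can show that $J_d(g_\mu) - J_d(g_\lambda) \le 0$. First, from (31), we note

$$\begin{aligned}
J_d(g_\mu) - J_d(g_\lambda)
&= \sum_{i \in S(\lambda)} p(i) d(i) - \sum_{i \in S(\mu)} p(i) d(i) \\
&= p(\sigma_{\lambda_\ell}(M)) \, d(\sigma_{\lambda_\ell}(M)) - p(\sigma_{\lambda_\ell}(M+1)) \, d(\sigma_{\lambda_\ell}(M+1))
\end{aligned} \tag{32}$$

by comments made earlier as we recall that $S(\lambda) = S(\lambda_\ell)$ and $S(\mu) = S(\lambda_{\ell+1})$.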

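Next, pick $\varepsilon > 0$ such that $\lambda + \varepsilon$ and $\mu + \varepsilon$ are in the open intervals $(\lambda_\ell, \lambda_{\ell+1})$ and $(\lambda_{\ell+1}, \lambda_{\ell+2})$, respectively. By (29) we get $S(\lambda + \varepsilon) = S(\lambda)$ and

$$\begin{aligned}
J_{\lambda+\varepsilon}(g_{\lambda+\varepsilon}) - J_\lambda(g_\lambda)
&= \sum_{i \notin S(\lambda+\varepsilon)} p(i) b_{\lambda+\varepsilon}(i) - \sum_{i \notin S(\lambda)} p(i) b_\lambda(i) \\
&= \sum_{i \notin S(\lambda)} p(i) b_{\lambda+\varepsilon}(i) - \sum_{i \notin S(\lambda)} p(i) b_\lambda(i) \\
&= \varepsilon \sum_{i \notin S(\lambda)} p(i) d(i).
\end{aligned} \tag{33}$$

Similarly,

$$J_{\mu+\varepsilon}(g_{\mu+\varepsilon}) - J_\mu(g_\mu) = \varepsilon \sum_{i \notin S(\mu)} p(i) d(i). \tag{34}$$

By Theorem 2, the mapping $\lambda \to J_\lambda(g_\lambda)$ is concave, hence

$$J_{\mu+\varepsilon}(g_{\mu+\varepsilon}) - J_\mu(g_\mu) \le J_{\lambda+\varepsilon}(g_{\lambda+\varepsilon}) - J_\lambda(g_\lambda).$$

Making use of (33) and (34) in this inequality and passing to complementary sets of documents, we readily conclude that

$$\sum_{i \in S(\lambda)} p(i) d(i) \le \sum_{i \in S(\mu)} p(i) d(i). \tag{35}$$

But $S(\lambda) = S(\lambda_\ell)$ and $S(\mu) = S(\lambda_{\ell+1})$, whence (35) is equivalent to

$$p(\sigma_{\lambda_\ell}(M)) \, d(\sigma_{\lambda_\ell}(M)) \le p(\sigma_{\lambda_\ell}(M+1)) \, d(\sigma_{\lambda_\ell}(M+1)).$$

The desired conclusion $J_d(g_\mu) - J_d(g_\lambda) \le 0$ is now immediate from (32). ■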
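Theorems 2 and 3 lend themselves to a quick numerical sanity check on a grid of multipliers. The following sketch is only an illustration on one hypothetical instance (the function name `costs`, the grid and the tolerances are assumptions); it does not replace the proofs above.

```python
def costs(p, c, d, lam, M):
    """Return (J_lam(g_lam), J_d(g_lam)) computed from the steady-state stack S(lam)."""
    n = len(p)
    order = sorted(range(n), key=lambda i: (-p[i] * (c[i] + lam * d[i]), i))
    out = order[M:]                                   # documents outside S(lam)
    j_lam = sum(p[i] * (c[i] + lam * d[i]) for i in out)
    j_d = sum(p[i] * d[i] for i in out)
    return j_lam, j_d

# Hypothetical instance: unit miss cost, sizes as d, cache size M = 2.
p = [0.4, 0.3, 0.2, 0.1]
c = [1.0, 1.0, 1.0, 1.0]
d = [1.0, 2.0, 8.0, 4.0]

grid = [0.05 * k for k in range(41)]                  # lam = 0, 0.05, ..., 2
vals = [costs(p, c, d, lam, 2) for lam in grid]
J = [v[0] for v in vals]
Jd = [v[1] for v in vals]

# Theorem 2: lam -> J_lam(g_lam) is non-decreasing and concave.
nondecreasing = all(b >= a - 1e-12 for a, b in zip(J, J[1:]))
concave = all(J[k - 1] - 2 * J[k] + J[k + 1] <= 1e-9 for k in range(1, len(J) - 1))
# Theorem 3: lam -> J_d(g_lam) is non-increasing (and takes finitely many values).
jd_nonincreasing = all(b <= a + 1e-12 for a, b in zip(Jd, Jd[1:]))

print(nondecreasing, concave, jd_nonincreasing)
print(sorted({round(x, 9) for x in Jd}, reverse=True))   # the plateaus of J_d
```

Concavity is tested through second differences on the (nearly) uniform grid, which must be non-positive for a concave function.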

D. The constrained optimal replacement policy

We are now ready to discuss the form of the optimal replacement policy for the constrained caching problem. Throughout we assume Assumption (A) to hold. Several cases need to be considered:


Case 1 - The unconstrained optimal replacement policy $g_0$ satisfies the constraint, i.e., $J_d(g_0) \le \alpha$, in which case $g^*$ is simply the optimal replacement policy $C_0$ for the unconstrained caching problem. This case is trivial and requires no proof since by Theorem 1 the average cost is minimized and the constraint satisfied.

Case 2 - The unconstrained optimal replacement policy does not satisfy the constraint, i.e., $J_d(g_0) > \alpha$, but there exists $\lambda > 0$ such that $J_d(g_\lambda) \le \alpha$. Two subcases of interest emerge and are presented in Theorems 4 and 5 below.

Case 2a - The situation when the policy $g_\lambda$ above saturates the constraint at level $\alpha$ was covered earlier in the discussion; its proof is therefore omitted.

Theorem 4: If there exists $\lambda > 0$ such that $J_d(g_\lambda) = \alpha$, then the policy $g_\lambda$ can be taken to be the optimal replacement policy $g^*$ for the constrained caching problem (and the constraint is saturated).

Case 2b - The case of greater interest arises when the conditions of Theorem 4 are not met, i.e., $J_d(g_0) > \alpha$ and $J_d(g_\mu) \ne \alpha$ for all $\mu \ge 0$, but there exists $\lambda > 0$ such that $J_d(g_\lambda) < \alpha$. In that case, by the monotonicity result of Theorem 3, the quantity

$$\gamma := \inf \{ \lambda \ge 0 : J_d(g_\lambda) \le \alpha \}$$

is a well defined scalar in $(0, \infty)$. In fact, we have the identification

$$\gamma = \lambda_{\ell+1}$$

for some $\ell = 0, 1, \ldots, L-1$, and it holds that

$$J_d(g_{\lambda_{\ell+1}}) < \alpha < J_d(g_{\lambda_\ell}). \tag{36}$$

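For each $p$ in the interval $[0, 1]$, define the Markov stationary policy $f_p$ obtained by randomizing the policies $g_{\lambda_\ell}$ and $g_{\lambda_{\ell+1}}$ with bias $p$. Thus, the randomized policy $f_p$ prescribes

$$\text{Evict doc}(i) \ \text{ if } \ i = \begin{cases}
\arg\min_{j \in S + r} \left( p(j) b_{\lambda_\ell}(j) \right) & \text{w.p. } p \\
\arg\min_{j \in S + r} \left( p(j) b_{\lambda_{\ell+1}}(j) \right) & \text{w.p. } 1 - p
\end{cases} \tag{37}$$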
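A single eviction decision under (37) can be sketched as follows. This is an illustration under stated assumptions rather than a reference implementation: the function name `evict_randomized`, the 0-based document indices and the sample values are hypothetical, and the two multipliers passed in are taken strictly inside two adjacent intervals $I_\ell$ and $I_{\ell+1}$ in order to avoid tie-breaking issues exactly at a breakpoint.

```python
import random

def evict_randomized(cache, r, p, c, d, lam_lo, lam_hi, bias, rng=random):
    """One eviction decision of a randomized policy of the form (37).

    cache  : set of cached documents; r : requested document, not in cache.
    bias   : probability of ranking documents with the multiplier lam_lo;
             with probability 1 - bias the multiplier lam_hi is used instead.
    Returns the new cache content. Eviction is optional in the sense that the
    victim may be the requested document r itself.
    """
    lam = lam_lo if rng.random() < bias else lam_hi
    candidates = sorted(cache | {r})
    victim = min(candidates, key=lambda j: p[j] * (c[j] + lam * d[j]))
    return (cache | {r}) - {victim}

# Hypothetical instance: unit miss cost, sizes as d, cache size M = 2.
p = [0.4, 0.3, 0.2, 0.1]
c = [1.0, 1.0, 1.0, 1.0]
d = [1.0, 2.0, 8.0, 4.0]

# bias = 1 always uses lam_lo, bias = 0 always uses lam_hi.
print(evict_randomized({0, 2}, 1, p, c, d, lam_lo=0.3, lam_hi=0.7, bias=1.0))
print(evict_randomized({0, 2}, 1, p, c, d, lam_lo=0.3, lam_hi=0.7, bias=0.0))
```

In this instance the two multipliers disagree only on whether document 0 or document 1 should occupy the last cache slot, which is exactly the single reversal described before (30).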

Theorem 5: The optimal cache replacement policy $g^*$ for the constrained caching problem is any randomized policy $f_p$ of the form (37) with $p$ determined through the saturation equation

$$J_d(f_p) = \alpha. \tag{38}$$


Proof. For the most part we follow the arguments of [2]: Pick $\lambda$ and $\mu$ in the open intervals $(\lambda_\ell, \lambda_{\ell+1})$ and $(\lambda_{\ell+1}, \lambda_{\ell+2})$, respectively, in which case

$$g_\lambda = g_{\lambda_\ell} \quad \text{and} \quad g_\mu = g_{\lambda_{\ell+1}}$$

with

$$J_d(g_\mu) < \alpha < J_d(g_\lambda). \tag{39}$$

Thus, as in the proof of Theorem 4.4 in [2], let $\lambda$ and $\mu$ go to $\lambda_{\ell+1}$ monotonically under their respective constraints. The resulting limiting policies $\underline{g}$ and $\overline{g}$ (in the notation of [2]) are simply given here by

$$\underline{g} = g_{\lambda_{\ell+1}} \quad \text{and} \quad \overline{g} = g_{\lambda_\ell}$$

with^4

$$J_\gamma(f_p) = J_\gamma(g_{\lambda_{\ell+1}}) = J_\gamma(g_{\lambda_\ell}) \tag{40}$$

for every $p$ in the interval $[0, 1]$, and optimality

$$J_\gamma(f_p) \le J_\gamma(\pi), \qquad \pi \in \mathcal{P}$$

follows. Moreover, the mapping $p \to J_d(f_p)$ being continuous [10], with

$$J_d(f_p)\big|_{p=0} = J_d(g_{\lambda_{\ell+1}}) \quad \text{and} \quad J_d(f_p)\big|_{p=1} = J_d(g_{\lambda_\ell}),$$

there exists at least one value $p$ in $(0, 1)$ such that (38) holds. The proof of optimality is now complete in view of comments made at the beginning of Section IV-C. ■

^4 See details in the proof of Theorem 4.4 in [2].

It is possible to give a somewhat explicit expression for $J_d(f_p)$ with $p$ in $[0, 1]$: Indeed, set

$$S^* := S(\lambda_\ell) \cap S(\lambda_{\ell+1}) = \{ \sigma_{\lambda_\ell}(1), \ldots, \sigma_{\lambda_\ell}(M-1) \}.$$

Then, we have

$$J_d(f_p) = \mathbf{E}[d(R)] - \lim_{T \to \infty} \frac{1}{T+1} \mathbf{E}_{f_p} \left[ \sum_{t=0}^{T} \mathbf{1}[R_t \in S_t] \, d(R_t) \right]$$

with

$$\lim_{T \to \infty} \frac{1}{T+1} \mathbf{E}_{f_p} \left[ \sum_{t=0}^{T} \mathbf{1}[R_t \in S_t] \, d(R_t) \right] = \sum_{i \in S^*} p(i) d(i) + r(p) \, p(\sigma_{\lambda_\ell}(M)) \, d(\sigma_{\lambda_\ell}(M)) + (1 - r(p)) \, p(\sigma_{\lambda_\ell}(M+1)) \, d(\sigma_{\lambda_\ell}(M+1))$$

where $r(p)$ represents the asymptotic fraction of time that the cache contains the document $\sigma_{\lambda_\ell}(M)$. It is a simple matter to check that

$$r(p) = \frac{p \cdot p(\sigma_{\lambda_\ell}(M))}{p \cdot p(\sigma_{\lambda_\ell}(M)) + (1 - p) \cdot p(\sigma_{\lambda_\ell}(M+1))}.$$

Case 3 - Finally, assume that

$$J_d(g_\lambda) > \alpha, \qquad \lambda \ge 0.$$

This situation is of limited interest as we now argue: Fix $\lambda > 0$. For each policy $\pi$ in $\mathcal{P}$, we can use the optimality of $g_\lambda$ to write

$$\alpha < \lambda^{-1} J_c(g_\lambda) + J_d(g_\lambda) \le \lambda^{-1} J_c(\pi) + J_d(\pi).$$

Thus, letting $\lambda$ go to infinity, we conclude that

$$\alpha \le J_d(\pi), \qquad \pi \in \mathcal{P}.$$

The constrained caching problem has no feasible solution unless there exists a policy that saturates the constraint. Typically, the inequality above will be strict.
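For the randomized policy $f_p$ of Case 2b, the explicit expression for $J_d(f_p)$ makes it easy to solve the saturation equation (38) numerically. The sketch below uses plain bisection on the bias; it assumes $J_d(f_0) < \alpha < J_d(f_1)$ as in (36) and that $J_d(f_p)$ increases with the bias, which follows from the inequality established after (35). All function names, the argument layout (`common` for $S^*$, `a` and `b` for the two swapped documents) and the sample instance are hypothetical.

```python
def r_of(bias, q_a, q_b):
    """Asymptotic fraction of time the cache holds document a = sigma(M).

    q_a and q_b are the request probabilities of a = sigma(M) and b = sigma(M+1).
    """
    return bias * q_a / (bias * q_a + (1.0 - bias) * q_b)

def jd_randomized(bias, p, d, common, a, b):
    """J_d(f_p): expected d-cost of all requests minus the d-cost of the hits."""
    r = r_of(bias, p[a], p[b])
    hits = (sum(p[i] * d[i] for i in common)
            + r * p[a] * d[a]
            + (1.0 - r) * p[b] * d[b])
    return sum(pi * di for pi, di in zip(p, d)) - hits

def saturating_bias(alpha, p, d, common, a, b, tol=1e-12):
    """Bisection for the bias solving J_d(f_p) = alpha (assumes a sign change on [0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if jd_randomized(mid, p, d, common, a, b) < alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Hypothetical instance: S* = {2}, the last slot alternates between documents 0 and 1.
p = [0.4, 0.3, 0.2, 0.1]
d = [1.0, 2.0, 8.0, 4.0]
alpha = 0.9                      # lies strictly between J_d(f_0) = 0.8 and J_d(f_1) = 1.0
bias = saturating_bias(alpha, p, d, common={2}, a=0, b=1)
print(bias, jd_randomized(bias, p, d, {2}, 0, 1))
```

Because $J_d(f_p)$ depends on the bias only through $r(p)$ and is affine in $r(p)$, a closed-form solution is also available; bisection is used here only to keep the sketch close to the formulas as stated.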
A. A proof of Theorem 1

The optimality of the Markov stationary policy $C_0$ for both finite and infinite horizon cost problems is a direct consequence of the following fact:

Proposition 1: For each $T = 0, 1, \ldots$, it holds that

$$\arg\min \{ u \in S + r : \mathbf{E}[V_T(S + r - u, R^*)] \} = \arg\min \{ u \in S + r : p(u)c(u) \} \tag{41}$$

for any state $(S, r)$ in $\mathcal{X}^*$ whenever $r$ is not in $S$.

The equality (41) is understood to mean that if $v := \arg\min \{ u \in S + r : p(u)c(u) \}$, then

$$\mathbf{E}[V_T(S + r - v, R^*)] = \min_{u \in S + r} \mathbf{E}[V_T(S + r - u, R^*)].$$

This statement is weaker than the monotonicity statement

$$p(u)c(u) \ge p(w)c(w) \quad \text{if} \quad \mathbf{E}[V_T(S + r - u, R^*)] \ge \mathbf{E}[V_T(S + r - w, R^*)], \qquad u, w \in S + r,$$

which appears in the optimality proofs of [1], [4].

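The proof of Proposition 1 proceeds by induction on $T = 0, 1, \ldots$.

The basis step - Fix $(S, r)$ in $\mathcal{X}^*$ and note that

$$V_0(S, r) = \mathbf{1}[r \notin S] \, c(r).$$

Thus, for distinct $u$ and $v$ in $S + r$, we have

$$\begin{aligned}
\mathbf{E}[V_0(S + r - u, R^*)]
&= \mathbf{E}\left[ \mathbf{1}[R^* \notin S + r - u] \, c(R^*) \right] \\
&= \mathbf{E}\left[ \mathbf{1}[R^* \notin S + r] \, c(R^*) \right] + \mathbf{E}\left[ \mathbf{1}[R^* = u] \, c(R^*) \right]
\end{aligned}$$

with a similar expression for $\mathbf{E}[V_0(S + r - v, R^*)]$. Hence,

$$\mathbf{E}[V_0(S + r - u, R^*)] - \mathbf{E}[V_0(S + r - v, R^*)] = p(u)c(u) - p(v)c(v)$$

and (41) does hold for $T = 0$.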

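The induction step - Assume (41) to hold for some $T = 0, 1, \ldots$. Fix $(S, r)$ in $\mathcal{X}^*$ with $r$ not in $S$. We need to show that for $u$ in $S + r$, we have

$$\mathbf{E}\left[ V_{T+1}(S + r - u, R^*) - V_{T+1}(S + r - v, R^*) \right] \ge 0 \tag{42}$$

if

$$v = \arg\min \{ j \in S + r : p(j)c(j) \}. \tag{43}$$

Fix $u$ in $S + r$ and let $R^{**}$ denote an rv distributed like $R^*$ and independent of it. Using the DPE (9) we can write

$$\begin{aligned}
\mathbf{E}[V_{T+1}(S + r - u, R^*)]
&= \mathbf{P}[R^* \in S + r - u] \, \mathbf{E}[V_T(S + r - u, R^{**})] \\
&\quad + \mathbf{E}\left[ \mathbf{1}[R^* \notin S + r - u] \, c(R^*) \right] \\
&\quad + \mathbf{E}\left[ \mathbf{1}[R^* \notin S + r - u] \, \widehat{V}_T(S + r - u, R^*) \right]
\end{aligned} \tag{44}$$

with

$$\widehat{V}_T(S, x) := \min_{u' \in S + x} \mathbf{E}[V_T(S + x - u', R^{**})]$$

for every set $S$ with $|S| = M$ and $x$ not in $S$.

Note that

$$\begin{aligned}
\mathbf{P}[R^* \in S + r - u] \, \mathbf{E}[V_T(S + r - u, R^{**})]
&= \mathbf{P}[R^* \in S + r - (u, v)] \, \mathbf{E}[V_T(S + r - u, R^{**})] \\
&\quad + p(v) \, \mathbf{E}[V_T(S + r - u, R^{**})]
\end{aligned} \tag{45}$$

with $v$ as defined by (43), and that

$$\mathbf{E}\left[ \mathbf{1}[R^* \notin S + r - u] \, c(R^*) \right] = \mathbf{E}\left[ \mathbf{1}[R^* \notin S + r] \, c(R^*) \right] + p(u)c(u). \tag{46}$$

Finally,

$$\begin{aligned}
\mathbf{E}\left[ \mathbf{1}[R^* \notin S + r - u] \, \widehat{V}_T(S + r - u, R^*) \right]
&= \mathbf{E}\left[ \mathbf{1}[R^* \notin S + r] \, \widehat{V}_T(S + r - u, R^*) \right] \\
&\quad + p(u) \, \widehat{V}_T(S + r - u, u).
\end{aligned} \tag{47}$$

Reporting (45), (46) and (47) into (44), we conclude that

$$\begin{aligned}
\mathbf{E}[V_{T+1}(S + r - u, R^*)]
&= \mathbf{P}[R^* \in S + r - (u, v)] \, \mathbf{E}[V_T(S + r - u, R^{**})] \\
&\quad + \mathbf{E}\left[ \mathbf{1}[R^* \notin S + r] \, c(R^*) \right] + p(u)c(u) \\
&\quad + p(u) \, \widehat{V}_T(S + r - u, u) \\
&\quad + p(v) \, \mathbf{E}[V_T(S + r - u, R^{**})] \\
&\quad + \mathbf{E}\left[ \mathbf{1}[R^* \notin S + r] \, \widehat{V}_T(S + r - u, R^*) \right].
\end{aligned} \tag{48}$$

We can now write the corresponding expression (48) with $u$ replaced by $v$ (and with the roles of $u$ and $v$ interchanged), and the difference in (42) takes the form

$$\begin{aligned}
\mathbf{E}\left[ V_{T+1}(S + r - u, R^*) - V_{T+1}(S + r - v, R^*) \right]
&= \left( p(u)c(u) - p(v)c(v) \right) \\
&\quad + \mathbf{P}[R^* \in S + r - (u, v)] \, \Delta_1 \\
&\quad + p(u) \Delta_2 + p(v) \Delta_3 + \Delta_4
\end{aligned} \tag{49}$$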


with

$$\Delta_1 := \mathbf{E}[V_T(S + r - u, R^{**})] - \mathbf{E}[V_T(S + r - v, R^{**})],$$

$$\Delta_2 := \widehat{V}_T(S + r - u, u) - \mathbf{E}[V_T(S + r - v, R^{**})],$$

$$\Delta_3 := \mathbf{E}[V_T(S + r - u, R^{**})] - \widehat{V}_T(S + r - v, v)$$

and

$$\Delta_4 := \mathbf{E}\left[ \mathbf{1}[R^* \notin S + r] \, \widehat{V}_T(S + r - u, R^*) \right] - \mathbf{E}\left[ \mathbf{1}[R^* \notin S + r] \, \widehat{V}_T(S + r - v, R^*) \right].$$

Observe that $p(u)c(u) - p(v)c(v) \ge 0$ by the definition of $v$ and that the condition $\Delta_1 \ge 0$, being equivalent to (41), holds true under the induction hypothesis. Next, we note that

$$\widehat{V}_T(S + r - u, u) = \min_{u' \in S + r} \mathbf{E}[V_T(S + r - u', R^{**})] = \mathbf{E}[V_T(S + r - v, R^{**})]$$

by the induction hypothesis and the definition of $v$, so that $\Delta_2 = 0$. Similarly,

$$\widehat{V}_T(S + r - v, v) = \min_{v' \in S + r} \mathbf{E}[V_T(S + r - v', R^{**})]$$

whence $\Delta_3 \ge 0$ again by the induction hypothesis and the definition of $v$. Consequently,

$$\mathbf{E}\left[ V_{T+1}(S + r - u, R^*) - V_{T+1}(S + r - v, R^*) \right] \ge \Delta_4$$

and (42)-(43) will hold if we can show that $\Delta_4 \ge 0$. Inspection of $\Delta_4$ reveals that $\Delta_4 \ge 0$ provided

$$\widehat{V}_T(S + r - u, x) - \widehat{V}_T(S + r - v, x) \ge 0 \tag{50}$$

whenever $x$ is not in $S + r$.

To establish (50) we find it useful to order the set of documents $\{1, \ldots, N\}$ according to their expected cost: For $u$ and $v$ in $\{1, \ldots, N\}$ we write $u < v$ (resp. $u \le v$) if $p(u)c(u) < p(v)c(v)$ (resp. $p(u)c(u) \le p(v)c(v)$), with equality $u = v$ if $p(u)c(u) = p(v)c(v)$. We can now interpret $v$ as the smallest element in $S + r$ according to this order. Two cases emerge depending on whether $x \le v$ or $v < x$:

Case 1 - Assume $x \le v$. Then, for $u \ne v$ in $S + r$, we have

$$\widehat{V}_T(S + r - u, x) = \min_{u' \in S + r - u + x} \mathbf{E}[V_T(S + r - u + x - u', R^{**})]. \tag{51}$$

Note that $x$ is not in $S + r - u$ and that $x$ is smallest in $S + r + x$ (thus in $S + r - u + x$ which contains it). By the induction hypothesis (applied to the state $(S + r - u, x)$) we can conclude that the minimization above is achieved at $u' = x$, so that

$$\widehat{V}_T(S + r - u, x) = \mathbf{E}[V_T(S + r - u, R^{**})]. \tag{52}$$

The same argument shows that

$$\widehat{V}_T(S + r - v, x) = \min_{v' \in S + r - v + x} \mathbf{E}[V_T(S + r - v + x - v', R^{**})] = \mathbf{E}[V_T(S + r - v, R^{**})] \tag{53}$$

by the induction hypothesis (applied to the state $(S + r - v, x)$). Combining these facts, we get

$$\widehat{V}_T(S + r - u, x) - \widehat{V}_T(S + r - v, x) = \mathbf{E}[V_T(S + r - u, R^{**})] - \mathbf{E}[V_T(S + r - v, R^{**})] \tag{54}$$

and (50) follows by invoking the induction hypothesis once more, this time in state $(S, r)$.

Case 2 - Assume $v < x$. Then, going back to the expression (51) for $u \ne v$ in $S + r$, we note that now $v$ is the smallest element of $S + r - u + x$, hence achieves the minimum in (51) by virtue of the induction hypothesis applied to the state $(S + r - u, x)$. Therefore,

$$\widehat{V}_T(S + r - u, x) = \mathbf{E}[V_T(S + r - u + x - v, R^{**})] = \mathbf{E}[V_T(S + r - v + x - u, R^{**})]. \tag{55}$$

On the other hand, by the induction hypothesis applied to the state $(S + r - v, x)$, we find that

$$\widehat{V}_T(S + r - v, x) = \min_{v' \in S + r - v + x} \mathbf{E}[V_T(S + r - v + x - v', R^{**})] = \mathbf{E}[V_T(S + r - v + x - v^*, R^{**})] \tag{56}$$

where $v^*$ is the smallest element in $S + r - v + x$. Collecting these expressions, we find

$$\widehat{V}_T(S + r - u, x) - \widehat{V}_T(S + r - v, x) = \mathbf{E}[V_T(S + r - v + x - u, R^{**})] - \mathbf{E}[V_T(S + r - v + x - v^*, R^{**})] \tag{57}$$

and here as well (50) follows by invoking the induction hypothesis once more, this time in state $(S + r - v, x)$ as we note that any $u \ne v$ in $S + r$ is necessarily in $S + r - v$. This completes the proof of Theorem 1.
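Proposition 1 can also be checked by brute force on a small instance. The sketch below iterates the expectations $\mathbf{E}[V_T(S, R^*)]$ over all cache contents of size $M$, using the basis step $V_0(S, r) = \mathbf{1}[r \notin S] c(r)$ and a recursion of the form appearing in (44) (the DPE (9) itself is not reproduced in this section, so its form here is an assumption inferred from (44)). The function names, the tolerance and the four-document instance are hypothetical.

```python
from itertools import combinations

def expected_values(p, c, M, horizon):
    """W[T][S] = E[V_T(S, R*)] for every cache content S of size M, T = 0..horizon."""
    n = len(p)
    states = [frozenset(s) for s in combinations(range(n), M)]
    # Basis step: E[V_0(S, R*)] is the expected cost of a miss on S.
    W = [{S: sum(p[x] * c[x] for x in range(n) if x not in S) for S in states}]
    for _ in range(horizon):
        prev, cur = W[-1], {}
        for S in states:
            total = sum(p[x] for x in S) * prev[S]        # hit: cache unchanged
            for x in range(n):
                if x not in S:                            # miss: pay c(x), then evict optimally
                    full = S | {x}
                    total += p[x] * (c[x] + min(prev[full - {u}] for u in full))
            cur[S] = total
        W.append(cur)
    return W

def proposition_holds(p, c, M, horizon, tol=1e-12):
    """Check (41): evicting argmin p(u)c(u) minimizes E[V_T(S + r - u, R*)] for all T, (S, r)."""
    n = len(p)
    W = expected_values(p, c, M, horizon)
    for T in range(horizon + 1):
        for S in W[T]:
            for r in range(n):
                if r in S:
                    continue
                full = S | {r}
                v = min(full, key=lambda u: p[u] * c[u])
                if W[T][full - {v}] > min(W[T][full - {u}] for u in full) + tol:
                    return False
    return True

# Hypothetical instance with distinct values of p(i)c(i) and cache size M = 2.
p = [0.4, 0.3, 0.2, 0.1]
c = [1.0, 3.0, 2.5, 8.0]
ok = proposition_holds(p, c, M=2, horizon=6)
print(ok)
```

The instance is chosen with distinct products $p(i)c(i)$ so that the minimizer in (43) is unique and tie-breaking does not enter the check.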